Entry Name: “PSU-Zhong-MC3”

VAST 2013 Challenge
Mini-Challenge 3: Visual Analytics for Network Situation Awareness

 

 

Team Members:

Chen Zhong, Mingyi Zhao, Gaoyao Xiao, Jun Xu

Pennsylvania State University

czz111@psu.edu PRIMARY

muz127@psu.edu

gzx102@psu.edu

junxzm@gmail.com

 

Student Team:  YES

 

Analytic Tools Used:

D3

Google BigQuery

Excel

Mind42

ARSCA, an analytical reasoning support tool for Cyber Analysis, developed by S2 Research Lab, PSU http://s2.ist.psu.edu/paper/paper42-Zhong-ISI2013-final.pdf

 

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2013 is complete? YES

 

Video:

http://personal.psu.edu/czz111/VAST/VAST2013/psu-zhong-mc3-vedio.wmv

http://personal.psu.edu/czz111/VAST/VAST2013/psu-zhong-mc3-vedio.swf  

 

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Questions

 

MC3.1 – Provide a timeline (i.e., events organized in chronological order) of the notable events that occur in Big Marketing’s computer networks for the two weeks of supplied data. Use all data at your disposal to identify up to twelve events and describe them to the extent possible.  Your answer should be no more than 1000 words long and may contain up to twelve images.

 

Event 1: Periodic UPnP multicast

Time: 7:00-8:00 from 2013-04-01 to 2013-04-15.

Description:

Beginning at 7:00-8:00 on 2013-04-01, most workstations in the organization’s network periodically sent a multicast SSDP packet to the UPnP multicast address (239.255.255.250:1900). Because of UPnP vulnerabilities, internal workstations (especially those in subnet 172.10.1.*) were exposed to IPs outside the network. Figure 1 shows the periodic multicast as a timeline-based heatmap (called a “Timeline-Heatmap”), with time on the x-axis and source IP on the y-axis. The color of each box represents the number of multicast packets sent from the corresponding IP in a given hour (yellow is 0, black is the maximum).
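Each box’s color in a Timeline-Heatmap comes from a linear scale running from yellow (0) to black (the scale maximum, 200 in Figure 1). The Python sketch below illustrates that mapping under the assumption of straight RGB interpolation; our actual D3 script may use a different interpolator, and the function name `heat_color` is hypothetical.

```python
def heat_color(count: int, max_count: int) -> str:
    """Map an hourly count to a hex color: yellow at 0, black at max_count."""
    if max_count <= 0:
        raise ValueError("max_count must be positive")
    # Clamp to [0, max_count] so counts above the scale maximum render as black.
    t = min(max(count, 0), max_count) / max_count
    # Interpolate red and green from 255 down to 0; blue stays 0 throughout.
    level = round(255 * (1 - t))
    return f"#{level:02x}{level:02x}00"
```

For example, `heat_color(100, 200)` returns `"#808000"`, the midpoint between yellow and black. Clamping keeps a single outlier hour from breaking the scale.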

                            

Figure 1. Timeline-Heatmap: Number of multicast packets per hour from internal network IPs


Scale: yellow = 0, black = 200

 

Figure 2. Timeline-Heatmap: Number of connections with 10.199.250.2 per hour

Scale: yellow = 0, black = 20

 

Event 2: FTP connections from outside IP 10.199.250.2 to workstations in subnet 172.10.1.* and the administrative workstation

Time: 2013-04-01 8:30-16:30; 2013-04-02 10:11, 11:52; 2013-04-03 6:00, 12:23; 2013-04-05 7:24, 10:08.

Description:

After the UPnP multicast, beginning around 2013-04-01 8:30, workstations in subnet 172.10.1.* sent their information (e.g., configuration) to 10.199.250.2 over TCP connections. 10.199.250.2 then responded to several workstations in subnet 172.10.1.*.

Beginning on 2013-04-02 10:00, 10.199.250.2 was able to establish FTP connections (carrying some payload) with the administrative workstation (172.10.0.40). These connections are shown in the first row of Figure 2. The network behavior of the administrative workstation is shown in the Timeline-Heatmap in Figure 3; it mainly kept broadcasting to the network. 10.199.250.2 may have altered the administrative workstation’s broadcast UDP packets in order to contact other workstations in 172.10.1.*.

 

Figure 3. Timeline-Heatmap: Number of connections with the administrative workstation (172.10.0.40) in week 1 and week 2

Scale: yellow = 0, black = 80

 

Event 3: DC server update

Time: 2013-04-01 8:30; 2013-04-02 6:00

Description:

The third Timeline-Heatmap in Figure 4 shows that DC03 used the LLMNR protocol to send a name resolution query to 224.0.0.252 around 2013-04-01 8:30.

After this update, DC03 could be reached by outside hosts 10.6.6.6 and 10.7.7.10. Beginning at 2013-04-02 6:00, all three DCs (DC01, DC02, DC03) continuously sent UDP packets to the switch 172.0.0.1. Around 2013-04-02 7:00, DC01 sent a UDP packet to a fake destination (192.168.3.4).

 

Figure 4. Timeline-Heatmap: Number of connections with the DC servers over two weeks

Scale: yellow = 0, black = 80

 

Event 4: 10.0.3.77 kept sending mail to Mail Server 01 throughout the two weeks.

Time: Began at 2013-04-02 8:00 and continued through the two weeks; 2013-04-02 10:00

Description:

Figure 5 is the Timeline-Heatmap for 10.0.3.77; its connections with Mail Server 01 (172.10.0.3) are shown in the first row. Opening the filtered data in Excel shows continuous connections on port 25 (SMTP) every 2 to 3 minutes, which could indicate a spam attack. Around 2013-04-02 10:00, BigBrother reported that Mail Server 01 was in a problematic state.
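The 2-to-3-minute regularity was read off the filtered rows in Excel. The same check can be written as a short Python sketch over the connection start times of one source-destination pair (for example, 10.0.3.77 to 172.10.0.3 on port 25). The band of 2 to 3 minutes comes from the observation above; the 80% share threshold and the function names are illustrative assumptions.

```python
from datetime import datetime


def inter_arrival_minutes(times: list[datetime]) -> list[float]:
    """Gaps, in minutes, between consecutive connection start times."""
    ordered = sorted(times)
    return [(b - a).total_seconds() / 60 for a, b in zip(ordered, ordered[1:])]


def looks_periodic(times: list[datetime], low: float = 2.0, high: float = 3.0,
                   share: float = 0.8) -> bool:
    """True if at least `share` of the gaps fall within [low, high] minutes."""
    gaps = inter_arrival_minutes(times)
    if not gaps:
        return False
    in_band = sum(low <= gap <= high for gap in gaps)
    return in_band / len(gaps) >= share
```

Using a share rather than requiring every gap to fit tolerates the occasional missed or delayed connection in a long-running sender.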

 

Figure 5. Timeline-Heatmap: Number of connections with 10.0.3.77 in week 1 and week 2

Scale: yellow = 0, black = 100

 

 

Event 5: Large number of connections from multiple outside hosts to Web Server 03 (DoS attack)

Time: 2013-04-03

Description:

From 2013-04-03 9:00 to 12:00, outside hosts (10.9.81.5) sent intensive connection requests to Web Server 03 (about 100,000 connections per hour). The three hours of connections from 10.9.81.5 to 172.30.0.4 (Web Server 03) are shown in the middle of Figure 5.

The BigBrother report (Figure 6) shows that Web Server 03 stopped working after 2013-04-03 12:46. It was restarted on 2013-04-05 8:31.
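Spikes like the three hours in Figure 5 can be flagged automatically by bucketing connection timestamps by hour and comparing each bucket with a threshold. A minimal Python sketch; the threshold is an assumption that would have to be set between a host’s normal hourly volume and attack-level volume such as the roughly 100,000 connections per hour seen here, and the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(times: list[datetime]) -> Counter:
    """Count connections per hour, truncating each timestamp to the hour."""
    return Counter(t.replace(minute=0, second=0, microsecond=0) for t in times)


def flag_spikes(times: list[datetime], threshold: int) -> list[datetime]:
    """Hours whose connection count reaches `threshold`, in chronological order."""
    counts = hourly_counts(times)
    return sorted(hour for hour, n in counts.items() if n >= threshold)
```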

 

Figure 5. Timeline-Heatmap: Number of connections with 10.9.81.5 over two weeks

Scale: yellow = 0, black = 100,000

 

 

Figure 6. The status of Web Server 03 reported by BigBrother


 

 

Event 6: 10.9.81.5 launched a port-scanning attack against all the servers in the network.

Time: Began at 2013-04-06 12:00

Description:

Figure 7 shows, from left to right, the timelines of 10.9.81.5 scanning subnets 1, 2, and 3. Subnet 1 was scanned at 2013-04-06 12:00 and 2013-04-07 3:00, while subnets 2 and 3 were scanned only at 2013-04-06 12:00.
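Figure 7 counts how many ports 10.9.81.5 probed in each subnet over time. A sketch of that aggregation in Python, counting distinct (destination IP, destination port) pairs per hour; the `Flow` record is a hypothetical simplified schema, not the VAST 2013 column names, and treating subnet 1 as `172.10.0.0/16` in the usage below is an assumption.

```python
from collections import defaultdict
from datetime import datetime
from ipaddress import ip_address, ip_network
from typing import NamedTuple


class Flow(NamedTuple):
    """Hypothetical simplified flow record (not the actual VAST column names)."""
    time: datetime
    src: str
    dst: str
    dst_port: int


def scanned_ports_per_hour(flows: list[Flow], src: str, subnet: str) -> dict[datetime, int]:
    """Distinct (destination IP, destination port) pairs probed by `src` in `subnet`, per hour."""
    net = ip_network(subnet)
    probes: dict[datetime, set] = defaultdict(set)
    for f in flows:
        if f.src == src and ip_address(f.dst) in net:
            hour = f.time.replace(minute=0, second=0, microsecond=0)
            probes[hour].add((f.dst, f.dst_port))
    return {hour: len(pairs) for hour, pairs in sorted(probes.items())}
```

Calling `scanned_ports_per_hour(flows, "10.9.81.5", "172.10.0.0/16")` yields one count per hour; an hour with an unusually large number of distinct pairs marks a likely scan. Repeated hits on the same port are counted once, so ordinary repeated traffic does not inflate the count.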

 

Figure 7. Timeline of the number of ports scanned by 10.9.81.5 in subnets 1, 2, and 3

 

Event 7: Remote Desktop Login

Time: 2013-04-07 11:00

Description:

After port-scanning the servers, 10.9.81.5 mounted a Remote Desktop login attack against the web servers 172.10.0.4, 172.10.0.9, 172.10.0.5, 172.20.0.6, and 172.30.0.7. We suspect that the attacker used remote desktop connections to control these servers. RDP connections can also be observed in week 2. Figure 8 shows the number of RDP connections per hour.
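The scan-then-RDP sequence can be checked by listing, for the scanning source, the first RDP connection to each server after the scan began (2013-04-06 12:00). A Python sketch, assuming a hypothetical simplified `Flow` record and RDP on its default TCP port 3389 (servers configured for a nonstandard RDP port would be missed):

```python
from datetime import datetime
from typing import NamedTuple

RDP_PORT = 3389  # default TCP port for Remote Desktop Protocol


class Flow(NamedTuple):
    """Hypothetical simplified flow record (not the actual VAST column names)."""
    time: datetime
    src: str
    dst: str
    dst_port: int


def first_rdp_after(flows: list[Flow], src: str, since: datetime) -> dict[str, datetime]:
    """Earliest RDP connection from `src` to each destination at or after `since`."""
    first: dict[str, datetime] = {}
    for f in flows:
        if f.src == src and f.dst_port == RDP_PORT and f.time >= since:
            if f.dst not in first or f.time < first[f.dst]:
                first[f.dst] = f.time
    return dict(sorted(first.items()))
```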

 

Figure 8. RDP Connection Timeline

 

Event 8: Large number of IPS-denied connection attempts

Time: Began at 2013-04-11 12:00

Description:

Figure 9 shows many Deny entries in the IPS log. After checking the data in Excel, we conclude that these entries record failed port-scan attempts.

 

Figure 9. IPS Warning (Deny) Entry Timeline


 

Event 9: SSH connections

Time: 2013-04-12 9:00

Description:

Many outbound SSH connections started to appear at this time and lasted until the end of week 2. We identified eight workstations as sources and one outside IP address as the destination, which could be a botnet C&C server. We believe that these workstations are infected and are exfiltrating data to the attacker.
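Identifying the eight source workstations and the single outside destination amounts to filtering outbound flows on port 22. A Python sketch under two assumptions: flows are hypothetical simplified records, and every 172.* address in this report is internal while all other addresses (including the 10.* hosts) are external. SSH on a nonstandard port would not be caught.

```python
from ipaddress import ip_address, ip_network
from typing import NamedTuple

SSH_PORT = 22  # default TCP port for SSH
# Assumption: every 172.* address in this report is internal, every other address external.
INTERNAL_NET = ip_network("172.0.0.0/8")


class Flow(NamedTuple):
    """Hypothetical simplified flow record (not the actual VAST column names)."""
    src: str
    dst: str
    dst_port: int


def is_internal(ip: str) -> bool:
    return ip_address(ip) in INTERNAL_NET


def outbound_ssh(flows: list[Flow]) -> tuple[set[str], set[str]]:
    """Internal sources and external destinations of outbound SSH flows."""
    sources: set[str] = set()
    destinations: set[str] = set()
    for f in flows:
        if f.dst_port == SSH_PORT and is_internal(f.src) and not is_internal(f.dst):
            sources.add(f.src)
            destinations.add(f.dst)
    return sources, destinations
```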

 

Figure 10. SSH Connection Timeline


 

Event 10: The workstations in subnet 172.10.1.* requested large payloads from outside server 10.1.0.100

Time: 2013-04-13 to 2013-04-14

Description:

2013-04-13 6:30-7:30: workstations in 172.10.1.* began requesting large payloads from outside server 10.1.0.100.

2013-04-13 23:30 to 2013-04-14 2:00: 172.10.1.* continued downloading from 10.1.0.100.

2013-04-14 6:30-7:30: 172.10.1.* continued downloading from 10.1.0.100.

2013-04-14 9:00-10:00: 172.10.1.* continued downloading and finished.
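The download phases above can be quantified by summing, per hour, the bytes that 10.1.0.100 returned to requesters in 172.10.1.0/24. A Python sketch; the `Flow` fields, in particular `dst_payload_bytes` for bytes sent back by the responder, are a hypothetical simplified schema:

```python
from collections import defaultdict
from datetime import datetime
from ipaddress import ip_address, ip_network
from typing import NamedTuple


class Flow(NamedTuple):
    """Hypothetical simplified flow record (not the actual VAST column names)."""
    time: datetime
    src: str                 # initiator, e.g. the requesting workstation
    dst: str                 # responder, e.g. the outside server
    dst_payload_bytes: int   # bytes returned by the responder


def hourly_download_bytes(flows: list[Flow], server: str, subnet: str) -> dict[datetime, int]:
    """Payload bytes returned by `server` to requesters in `subnet`, summed per hour."""
    net = ip_network(subnet)
    totals: dict[datetime, int] = defaultdict(int)
    for f in flows:
        if f.dst == server and ip_address(f.src) in net:
            hour = f.time.replace(minute=0, second=0, microsecond=0)
            totals[hour] += f.dst_payload_bytes
    return dict(sorted(totals.items()))
```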

 

Event 11: Eight workstations were used as bots to launch a DDoS attack against outside host/server 10.1.0.100

Time: 2013-04-13 7:00-8:00; 2013-04-14 7:00-8:00

Description:

Eight workstations (shown on the x-axis in Figure 11) made a large number of small-payload connections to 10.1.0.100 in these two time slots.
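Picking out the bot workstations can be phrased as: within a time slot, which sources opened many small-payload connections to the target? A Python sketch; the `Flow` record is hypothetical, and `min_conns` and `max_payload` are thresholds the analyst would need to choose, not values from our analysis.

```python
from collections import Counter
from datetime import datetime
from typing import NamedTuple


class Flow(NamedTuple):
    """Hypothetical simplified flow record (not the actual VAST column names)."""
    time: datetime
    src: str
    dst: str
    src_payload_bytes: int


def likely_bots(flows: list[Flow], target: str, start: datetime, end: datetime,
                min_conns: int, max_payload: int) -> list[str]:
    """Sources with at least `min_conns` small-payload connections to `target` in [start, end)."""
    counts = Counter(
        f.src for f in flows
        if f.dst == target and start <= f.time < end and f.src_payload_bytes <= max_payload
    )
    return sorted(src for src, n in counts.items() if n >= min_conns)
```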

 

Figure 11. Timeline-Heatmap: Number of connections to 10.1.0.100 over two weeks

Scale: yellow = 0, black = 100,000

 

Event 12: 10.6.6.7 launched a DoS attack against the DC servers.

Time: 2013-04-15 9:00-10:00

Description:

The right side of Figure 4 shows that 10.6.6.7 connected intensively to the three DC servers. Because the payloads are small and almost every connection uses a different source port, this could be a DoS attack.
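The two cues above, small payloads and nearly unique source ports, can be summarized per source-destination pair. A Python sketch with a hypothetical `Flow` record; what counts as “small” and how close the port ratio must be to 1.0 are judgment calls left to the analyst:

```python
from typing import NamedTuple


class Flow(NamedTuple):
    """Hypothetical simplified flow record (not the actual VAST column names)."""
    src: str
    dst: str
    src_port: int
    src_payload_bytes: int


def flood_profile(flows: list[Flow], src: str, dst: str) -> tuple[int, float, float]:
    """(connection count, mean payload bytes, distinct-source-port ratio) for src -> dst."""
    selected = [f for f in flows if f.src == src and f.dst == dst]
    if not selected:
        return 0, 0.0, 0.0
    n = len(selected)
    mean_payload = sum(f.src_payload_bytes for f in selected) / n
    # A ratio close to 1.0 means almost every connection used a fresh source port.
    port_ratio = len({f.src_port for f in selected}) / n
    return n, mean_payload, port_ratio
```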

 

MC3.2 – Speculate on one or more narratives that describe the events on the network. Provide a list of analytic hypotheses and/or unanswered questions about the notable events. In other words, if you were to hand off your timeline to an analyst who will conduct further investigation, what confirmations and/or answers would you like to see in their report back to you? Your answer should be no more than 300 words long and may contain up to three additional images.

 

Narrative 1. The attacker conducted a port scan against Big Marketing’s servers and found vulnerabilities. By exploiting these vulnerabilities, the attacker was able to log in to the victim servers via the Remote Desktop Protocol (RDP) and install malware. The RDP connections started at 2013-04-07 11:00. Since Big Marketing’s workstations visit these web servers, the implanted malware could spread to the workstations. Starting at 2013-04-12 9:00, bots (infected workstations) began using SSH to communicate with the C&C server in order to receive instructions or exfiltrate Big Marketing’s sensitive data.

 

Narrative 2. The attacker exploited UPnP vulnerabilities, which exposed some workstations to the outside Internet. An outside C&C server took control of these workstations and installed a rootkit on them. After the C&C server gathered more information about the network, it conducted a port scan against the servers. The rest of the process is similar to the corresponding part of Narrative 1.

 

Narrative 3. Some workstations were infected by a drive-by-download attack. The attacker (10.0.3.77) sent spear-phishing emails through the SMTP server to Big Marketing’s employees. Some employees read the malicious emails and clicked their links, and their workstations were infected by the attacker’s malware. The rest of the process is similar to the corresponding part of Narrative

 

Hypotheses and Unanswered Questions

1. We strongly suspect that the RDP and SSH connections in the log are attacker activity. However, they could be legitimate, because the IPS allows this type of traffic. The analyst should therefore investigate these connections further to determine whether they are malicious.

 

2. The BigBrother report (Figure 6) shows that Web Server 03 stopped working after 2013-04-03 12:46 and was restarted on 2013-04-05 8:31. Was this server shut down by the administrator?

3. Event 11 identified eight victim workstations. Who are they? What assets do they hold? Are they key nodes in the network? The analyst needs to examine their configuration.



4. Is 10.1.0.100 a malicious server or a legitimate client of Big Marketing? Why did the internal workstations receive large payloads from it? The analyst needs to gather and analyze the packet data.

 

MC3.3 – Describe the role that your visual analytics played in enabling discovery of the notable events in MC3.1. Describe whether your visual analytics play a role in formulating the questions in MC3.2. Your answer should be no more than 300 words long and may contain up to three additional images.

 

Our visual analytics is the visualized analytical reasoning process that we recorded in Mind42 (called the AOH-Map).

 

Figure 12. Partial AOH-Map. Public link: http://mind42.com/public/431032f5-4a12-4fd8-ad93-de9462a463fa.


 

Visualizing the analysis process has several advantages:

First, recording our analysis process in a shared representation makes it convenient to introduce new ideas to each other and share new findings. We represent our analytical process as an iterative cycle of three components: action, observation, and hypothesis. A new hypothesis is generated from an existing observation; verifying the hypothesis requires further actions, which lead to new observations. In the AOH-Map, this cycle appears as a tree structure.
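The action-observation-hypothesis cycle maps naturally onto a tree whose node types alternate in a fixed order. A hypothetical Python sketch of such a tree (the class and method names are ours, and this is not necessarily how Mind42 or ARSCA store the map), which rejects children that break the cycle and can list every hypothesis in a subtree:

```python
from dataclasses import dataclass, field

# The analysis cycle: an observation suggests a hypothesis, a hypothesis calls for
# an action, and an action produces a new observation.
NEXT_KIND = {"observation": "hypothesis", "hypothesis": "action", "action": "observation"}


@dataclass
class AOHNode:
    kind: str  # "action", "observation", or "hypothesis"
    text: str
    children: list["AOHNode"] = field(default_factory=list)

    def add(self, kind: str, text: str) -> "AOHNode":
        """Attach and return a child node, enforcing the cycle order."""
        if kind != NEXT_KIND[self.kind]:
            raise ValueError(f"a {self.kind} node cannot have a {kind} child")
        child = AOHNode(kind, text)
        self.children.append(child)
        return child

    def hypotheses(self) -> list[str]:
        """Texts of all hypothesis nodes in this subtree, depth-first."""
        found = [self.text] if self.kind == "hypothesis" else []
        for child in self.children:
            found.extend(child.hypotheses())
        return found
```

Because a hypothesis node may have several action children, one teammate per branch can verify the same hypothesis in parallel without the tree losing track of which observation motivated it.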

 

Second, it enables us to divide our work by hypothesis and conduct hypothesis-based collaboration. More specifically, each of us creates new hypotheses based on existing observations. Each of us then chooses a hypothesis to verify, performs the necessary actions, and records those actions and the resulting observations. New hypotheses can then be generated from the new observations. In our experience, this hypothesis-based collaboration is very effective.

 

Therefore, in answer to MC3.3, each notable event in MC3.1 is summarized from the observations we recorded during the analysis process. The answer to MC3.2 is also directly reflected in the hypotheses in the AOH-Map. Since each action aims to verify a hypothesis, every visualization we developed is driven by a specific goal. We therefore propose the “visualization function” idea: instead of developing or leveraging a complex, integrated visualization tool, we use only these refined functions:

• (PERL FILTER_IP, datasource, ip)  // Perl script to filter big data

• (PERL COUNT_HOURLY, datasource, field)  // Perl script to aggregate big data

• (EXCEL HISTOGRAM field)  // draw an Excel histogram

• (EXCEL FILTER field)  // use the Excel filter

• (EXCEL SORT field)  // use Excel to sort

• (BIGQUERY table sql)  // run an SQL query on Google’s BigQuery

• (BIGQUERY GOOGLEVIZ sql, chartscript)  // Google script that utilizes GoogleViz (e.g., Figure 7)

• (D3-TIME-HEATMAP, ip, field)  // JavaScript using the D3 library to draw a Timeline-Heatmap for time series analysis (e.g., Figure 1)
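The two Perl functions can be illustrated with a Python sketch of the same idea. The column names `time`, `src` and `dst` and the timestamp format are assumptions, and the scripts we actually used are Perl, not this code:

```python
from collections import Counter
from typing import Iterable, Iterator


def filter_ip(rows: Iterable[dict], ip: str,
              fields: tuple[str, ...] = ("src", "dst")) -> Iterator[dict]:
    """FILTER_IP: yield rows in which `ip` appears in any of the address fields."""
    return (row for row in rows if any(row.get(f) == ip for f in fields))


def count_hourly(rows: Iterable[dict], field: str, time_field: str = "time") -> Counter:
    """COUNT_HOURLY: count rows per (hour, value of `field`).

    Assumes timestamps are strings like "2013-04-02 08:03:11", so the first
    13 characters ("2013-04-02 08") identify the hour.
    """
    return Counter((row[time_field][:13], row[field]) for row in rows)
```

Because both functions consume and produce plain iterables, they compose the way the list above intends: `count_hourly(filter_ip(rows, "10.0.3.77"), "dst")` gives the per-hour, per-peer counts behind a Timeline-Heatmap, and `rows` could come from `csv.DictReader`.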

Although each “visualization function” is simple, we can conveniently combine them in various ways according to our particular needs. Moreover, each existing “visualization function” is also an intermediate result of the analysis process, because others can reuse it by applying it to another data set. Our practice so far shows that the current eight functions provide strong support for our analysis.